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Abstract 

In  recent  years  application  of  a discrete  wavelet  transform  (DWT)  has  become  an  estab- 
lished tool  for  the  design  of  preconditioners  for  smooth,  dense  matrices,  such  as  those 
that  arise  in  the  solution  of  certain  integral  equations.  In  this  paper  we  consider  the 
higher  dimensional  case,  where  the  matrix  A is  not  itself  smooth,  but  has  a smooth  block 
structure.  To  precondition  such  matrices,  we  use  repeated  application  of  a level  1 block- 
wise  DWT  to  exploit  the  fact  that  corresponding  entries  in  adjacent  blocks  are  close  in 
value.  We  illustrate  the  effectiveness  of  our  methods  by  means  of  numerical  examples. 

1 Introduction 

We  have  previously  ([9])  considered  wavelet- based  preconditioning  methods  for  dense 
matrices  having  the  property  that  the  entries  vary  smoothly  (that  is  to  say,  adjacent 
entries  are  close  in  value)  apart  from  known  areas  of  singularity,  for  example  a non- 
smooth diagonal  band.  The  main  idea  is  to  use  wavelet  compression  (see,  for  example 
[14])  to  convert  “smoothness”  in  the  original  matrix  into  “smallness”  in  the  transformed 
matrix,  and  then  to  approximate  the  transformed  matrix  by  dropping  small  entries. 
Smooth  matrices  arise  in  a range  of  applications  (see,  for  example,  [6,  8,  10])  involving 
an  essentially  1-dimensional  discretization  process.  In  higher  dimension  cases  the  corres- 
ponding matrices  have  a block  structure:  each  block  is  smooth  and  corresponding  entries 
in  different  blocks  vary  smoothly;  but  discontinuities  at  the  edges  of  the  blocks  mean  that 
standard  application  of  DWT  does  not  give  good  compression.  In  this  paper  we  extend 
the  ideas  of  [9]  to  enable  preconditioners  to  be  designed  for  such  matrices.  Throughout 
we  use  Daubechies  wavelets,  which  are  orthogonal  and  have  compact  support. 

2 DWT-based  preconditioners 

We  are  interested  in  fast  solution  of  linear  systems 

Ax  = b,  x,be  Cn,  A e C”xn,  (2.1) 

where  A is  a large,  dense  matrix.  Krylov  subspace  iterative  methods,  such  as  GMRES 
(described  in  [13]),  can  be  used  to  solve  (2.1),  but  in  most  cases  preconditioning  is 
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required  in  order  to  obtain  good  convergence.  One  method  of  preconditioning  is  to  seek 
a matrix  M ps  A such  that  M~1v  can  be  calculated  cheaply  for  any  vector  v.  For  smooth 
dense  A the  task  is  usually  made  easier  by  transforming  (2.1)  into  a wavelet  basis  (see 
e.g.  [4,  5,  6,  10,  11]).  When  a DWT  is  applied  to  such  an  A,  the  resulting  matrix  A has 
many  small  elements.  A sparse  A « A can  be  obtained  by  setting  to  zero  small  elements. 
This  is  the  main  idea  underlying  most  wavelet-based  preconditioners. 

2.1  Preconditioners  for  1-D  problems 

Typically  A is  smooth  apart  from  a narrow  diagonal  band.  When  a level  k standard  DWT 
is  applied  A has  a ‘finger’  pattern  of  large  entries  (caused  by  the  non-smooth  diagonal 
feature)  and  an  n/2fc  x n/2k  block  of  large  entries  at  the  top-left  corner.  Here  n should 
be  a power  of  2.  We  can  form  a preconditioner  M « A by  setting  to  zero  entries  that  fall 
below  some  chosen  threshold,  but,  because  of  the  finger  pattern,  a large  amount  of  fill-in 
occurs  under  LU  factorization.  To  avoid  this  problem  M can  be  obtained  by  setting  to 
zero  entries  in  A that  fall  outside  of  a diagonal  band.  We  describe  this  approach  as  a 
“band  cut” . 

The  finger  pattern  can  be  avoided  by  using  DWTPer  (DWT  with  permutations,  first 
proposed  in  [6],  see  also  [7]),  which  centres  the  fingers  to  form  a sparse  diagonal  band 
whose  width  can  be  predicted  accurately.  M can  then  be  formed  by  applying  a band  cut 
to  A and  (optionally)  imposing  a threshold. 

An  alternative  way  of  avoiding  the  creation  of  a finger  pattern  matrix  is  to  use  the 
Non-Standard-forms  (NS-forms)  of  Beylkin,  Coifman  and  Rokhlin  (see  [3])  to  represent 
A in  terms  of  the  blocks  of  a larger  matrix.  In  [9]  we  presented  a new  way  of  using  the 
NS-form  submatrices  to  precondition  A based  on  the  Schur  complement  and  recursive 
application  of  a flexible  GMRES  iteration.  We  compared  four  alternative  DWT-based 
preconditioning  methods: 

PI  standard  DWT  preconditioner  with  band  cut  ([5]), 

P2  DWTPer  preconditioner  with  band  cut  ([6,  10]), 

P3  NS-form  preconditioner  with  threshold  ([3,  11]), 

P4  Recursive  Schur  complement  preconditioner  ([9]), 

and  found  that,  for  smooth  matrices  with  a diagonal  singularity,  P4  gave  consistently 
good  performance,  PI  performed  well  for  moderate  singularities  and  P2  was  best  when 
the  diagonal  singularity  was  very  pronounced.  When  we  came  to  consider  2-D  problems, 
the  robustness  of  P4  encouraged  us  to  consider  ways  of  extending  it  to  higher  dimensions. 

2.2  Extension  to  matrices  with  block  structure 

In  the  2-dimensional  case  we  are  concerned  with  matrices  that  have  a smooth  block 
structure.  We  can  compress  dense  block  matrices  of  this  type  using  two  different  types 
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of  DWT:  The  block  DWT  has  a transform  matrix  of  the  form 

/ 0 

■ivLm,n)  = im  ® W 0 w(n)  ’’ 


where  W1'71'1  is  a standard  n x n DWT  matrix  and  0 is  the  n x n zero  matrix.  It  exploits 
smoothness  within  blocks.  The  Big  Block  DWT  (BBDWT)  exploits  smoothness 
between  blocks.  It  has  a transform  matrix  of  the  form 
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where  ho,...  , hp-i  and  go,--  - ,9d~ i are  the  low-pass  and  high-pass  filter  coefficients 
respectively  ( D being  the  order  of  the  wavelet  transform),  / is  the  nxn  identity  matrix 
and  0 is  the  nxn  zero  matrix.  The  resulting  transformed  matrix  has  a ‘finger’  structure  of 
blocks,  each  with  a diagonal  structure.  We  can  avoid  the  finger  pattern  by  permuting  the 
rows  and  columns  of  the  transformed  matrix  so  as  to  centre  the  blocks  containing  large 
entries.  We  call  this  modified  big  block  transform  BBDWTPer,  because  it  is  a big  block 
version  of  the  DWTPer  transform  described  in  [10].  We  anticipate  that  BBDWTPer 
may  be  useful  for  preconditioning  block  matrices  with  a very  strong  block  diagonal 
singularity  (see  the  comparison  of  DWTPer  and  other  DWT-based  preconditioners  in 
[9]),  but  we  have  not  yet  found  example  matrices  for  which  BBDWTPer  provides  a good 
preconditioner.  Preconditioners  based  on  BBDWT  and  BBDWTPer  are  tested  in  Section 
4;  we  now  present  a more  effective  method. 

3 Recursive  BBDWT-based  preconditioning 

An  alternative  way  of  avoiding  the  ‘finger’  pattern  is  to  use  a ‘Big  Block’  version  of  the 
NS-forms  presented  in  [3].  We  define  the  Big  Block  NS-form  (BBNS-form)  of  a matrix 
as  follows.  To  transform  a matrix  consisting  of  m2  blocks,  each  of  dimension  n (where  m 
and  n are  powers  of  2)  we  define  P,,  Q,  to  be  the  mn/2'  x mnj 2!_1  matrices  such  that 

-(g).  (3.1) 
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Given  an  mn  x mn  matrix  A,  define  T0  = A, 

Ti  = PiTi-xP? , At  = QiTi~1QiT,  Bi  = , Ci  = PjT^1QiT,  (3.2) 

Mt;  Sr.1)-  (33) 

The  level  k BBNS-form  of  A comprises  Tk  together  with  Ai,Bi,Ci,  i = 1,2, ...  ,k.  (The 
blocks  of  Ti  are  arranged  differently  from  those  of  the  standard  level  1 DWT  of  T,.  We 
have  used  this  ordering  in  order  to  be  consistent  with  the  notation  of  [3].j 

We  propose  to  use  banded  approximations  to  the  submatrices  of  the  BBNS-form  as 
the  basis  for  our  preconditioner.  If  the  blocks  of  A vary  smoothly  apart  from  a diagonal 
block  band,  then  each  of  Ai+ 1,  Bl+i,  Ci+i  will  have  small  entries  except  for  a wrap- 
round  diagonal  block  band.  So  we  can  approximate  them  by  Ai+i,  Bt+\ , C,;+i,  formed 
by  cutting  to  a block  band,  giving  an  approximation  Ti  to  T): 

?;«)■  <3-4> 

(In  practice,  it  is  unnecessary  to  compute  A,+ 1,  Bi+ 1,  C,+1  and  then  to  set  entries  outside 
the  block  band  to  zero;  instead  we  can  compute  only  the  non-zero  entries  of  Ai+\ , Bi+i , 
Ci+1.  This  enables  us  to  reduce  the  cost  of  forming  T).) 

We  now  show  how  this  can  help  us  to  solve  (2.1).  We  use  a flexible  GMRES  iteration 
(see  [12])  preconditioned  by  approximate  solution  of  an  equation  of  the  form  Ay  = v at 
each  step.  To  do  this  we  first  apply  a level  1 BBDWT  with  a block  band  cut  to  give 


where  y\  = Qiy,  j/2  = Piy,j>i  = Qiv,  V2  = Piv.  We  solve  this  equation  using  the  Schur 
complement  Si  = T)  — CiA^Bi.  This  requires  us  to  solve  an  equation  of  the  form 

SiV2  =W2,  (3.6) 

which  we  do  by  a further  GMRES  iteration.  We  expect  that  Ti  will  be  an  effective 
preconditioner  for  Si  (see  [1]§9.3),  so  we  now  seek  a cheap  way  of  applying  T-f1  to  a 
vector.  To  do  this  we  repeat  the  process  of  applying  a level  1 BBDWT  and  using  the 
Schur  complement.  In  summary,  during  the  solution  of  (3.6)  we  solve  a preconditioning 
equation  of  the  form 

Tiy  = v,  y,veCmn/2.  (3.7) 

To  do  this  cheaply  we  repeat  the  process  of  applying  a level  1 BBDWT  and  using  the 
Schur  complement  and  obtain  an  equation  of  the  form 

S2z  = w,  z,weCmn !A.  (3.8) 

This  in  turn  can  be  solved  using  flexible  GMRES  preconditioned  by  T2.  We  continue 
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recursively,  solving  equations  of  the  form 

SiZ  = w,  z,w  6 Cm"/2‘ 

iteratively,  preconditioning  by  solving  equations  of  the  form 

TiV  = v,  y,v  G C'"”-/2\ 


(3.10) 


until  the  matrix  T,  is  small  enough  that  T~l  can  be  applied  directly  by  means  of  LU 
factorization  at  low  cost.  Therefore,  at  level  i,  each  GMRES  iteration  requires  a pre- 
conditioning step  that  in  turn  calls  for  iterative  solution  by  GMRES  of  a coarser  level 
equation.  At  the  coarsest  level  the  preconditioner  is  applied  directly  using  an  LU  factor- 
ization of  Ti+i.  This  process  is  summarized  in  Algorithm  3.1. 

Algorithm  3.1  Approximate  solution  ofFy  = v. 

(1)  Compute  vi  — Qi+iv,  v2  = Pi+iv. 

(2)  Solve  Ai+1wi  = Vi  for  wi. 

(3)  Set  ui2  =v2-  Ci+  iuq. 

(4)  Define  Si+1  = Ti+l  - Ci+iA~^Bi+1. 

(5)  Solve  Si+1y2  = w2  for  y2  by  flexible  GMRES  iteration,  preconditioning  with  Ti+ 1, 
using  Algorithm  3.1  if  i + 1 < k and  using  matrices  Li+ 1,  Ui+i  otherwise. 

(6)  Set  yi=wi-  A~^Bi+ xy2. 


(7)  Set  y : 


V P& in 


To  solve  equation  (2.1),  we  start  the  solution  process  for  level  i = 0 and  apply  a 
GMRES  iteration  with  the  preconditioner  T\  to  the  Schur  complement  of  the  transformed 
T0  = A.  The  overall  method  is  presented  in  Algorithm  3.2. 

Algorithm  3.2  Solution  of  Ax  — b by  recursively  preconditioned  flexible  GMRES. 

(1)  Set  up 

(a)  Input  matrix  A,  vector  b,  tolerance  t. 

(b)  Decide  on  values  for: 

• maximum  wavelet  level,  k, 

• tolerance  ii  for  inner  iterations, 

• block  band  width  for  approximating  the  submatrices. 

(c)  Set  Tq  = A and  i = 0. 

(d)  Recursively,  for  i = 1 . . . k + 1,  compute  Tir  At,  Bi,  Ci,  and  factorize  A,. 

(e)  Factorize  Tk+ 1 into  Lk+i,  Uk+\. 

(2)  Solve  Tqx  = b by  flexible  GMRES  preconditioned  using  Algorithm,  3.1. 

Note  that  the  relatively  expensive  step  of  computing  the  BBNS-form  matrices  At,B ,,  C, , T,; 
is  done  only  once. 
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4 Numerical  results 


Here  we  illustrate  the  effectiveness  of  our  method,  and  compare  it  with  some  alternative 
approaches,  by  considering  two  example  mn  x mn  matrices: 


A 


ni+j,nk+l 


c i — k and  j = l, 

\ log((i  - k)2  + ( j - l)2)  otherwise, 


(4.1) 


for  i,  k = 0, 1, . . . , m — 1;  j,  l = 0, 1, . . . ,n-l;ca  constant. 

Bni+jtnk+l=e-(^2+^2),  (4.2) 

for  i,  k = 0, 1, . . . ,m  — 1;  j,  l — 0, 1, . . . ,n  — 1. 

Tables  1 and  2 give  typical  results  for  the  matrices  A and  B respectively.  The  cost 
of  reducing  the  relative  residual  norm  to  a tolerance  of  10~6  is  shown  for  matrices  of 
various  sizes  using  the  following  preconditioners: 

PI  simple  band  preconditioner, 

P2  standard  BBDWT  + band  cut  preconditioner, 

P3  BBDWTPer  + band  cut  preconditioner, 

P4  recursive  BBDWT-based  preconditioner. 

In  each  case  GMRES  was  restarted  after  10  iterations.  V denotes  non-convergence  of 
GMRES(IO).  Unpreconditioned  GMRES(IO)  failed  to  converge  to  the  required  tolerance 
for  any  size  of  matrix,  so  it  is  omitted  from  the  tables. 
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n 

N = mn 

Preconditioned  GMRES 

Direct 

solution 

PI 

P2 

P3 

P4 

its. 

Mflops 

its. 

Mflops 

its. 

Mflops 

its. 

Mflops 

Mflops 

8 

8 

64 

30 

0.65 

49 

1.2 

38 

0.99 

6 

0.32 

0.21 

16 

16 

256 

58 

17 

* 

* 

* 

* 

7 

5.5 

12 

32 

32 

1024 

86 

393 

* 

* 

* 

* 

7 

150 

720 

64 

64 

4096 

* 

* 

* 

* 

* 

* 

7 

6300 

46000 

Tab.  1.  Cost  of  solving  Ax  = b. 
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n 

N = mn 

Preconditioned  GMRES 

Direct 

solution 

PI 

P2 

P3 

P4 

its. 

Mflops 

its. 

Mflops 

its. 

Mflops 

its. 

Mflops 

Mflops 

8 

8 

64 

8 

0.19 

8 

0.26 

8 

0.26 

4 

0.26 

0.21 

16 

16 

256 

62 

19 

66 

21 

63 

21 

6 

5.6 

12 

32 

32 

1024 

67 

310 

76 

380 

74 

370 

6 

120 

720 

64 

64 

4096 

69 

5000 

78 

6000 

78 

6000 

6 

1700 

46000 

Tab.  2.  Cost  of  solving  Bx  = b. 


Clearly  the  recursive  BBDWT  approach  gives  better  performance  than  the  alternat- 
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ive  preconditioners  that  we  tested  and  offers  substantial  savings  compared  with  direct 
solution. 

5 Conclusion  and  future  work 

We  have  designed  a preconditioning  method  that  exploits  smoothness  between  the  blocks 
of  a class  of  dense  matrices  giving  useful  savings  compared  with  both  direct  solution  and 
preconditioned  GMRES  using  band  preconditioners.  In  the  future  we  plan  to  explore  a 
number  of  ways  of  further  improving  our  methods  including:  (a)  using  a block  DWT, 
in  addition  to  the  BBDWT,  to  exploit  smoothness  within  each  block;  (b)  using  biortho- 
gonal  wavelets  or  multiwavelets  (particularly  the  new  supercompact  Haar  multiwavelets 
presented  in  [2])  to  give  improved  compression;  (c)  preprocessing  the  matrix  to  enhance 
smoothness. 
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